Fellowships, Grants, & Awards

نویسندگان

  • Lars Kuepfer
  • Uwe Sauer
  • Pablo A Parrilo
چکیده

Background: Current approaches to parameter estimation are often inappropriate or inconvenient for the modelling of complex biological systems. For systems described by nonlinear equations, the conventional approach is to first numerically integrate the model, and then, in a second a posteriori step, check for consistency with experimental constraints. Hence, only single parameter sets can be considered at a time. Consequently, it is impossible to conclude that the "best" solution was identified or that no good solution exists, because parameter spaces typically cannot be explored in a reasonable amount of time. Results: We introduce a novel approach based on semidefinite programming to directly identify consistent steady state concentrations for systems consisting of mass action kinetics, i.e., polynomial equations and inequality constraints. The duality properties of semidefinite programming allow to rigorously certify infeasibility for whole regions of parameter space, thus enabling the simultaneous multi-dimensional analysis of entire parameter sets. Conclusion: Our algorithm reduces the computational effort of parameter estimation by several orders of magnitude, as illustrated through conceptual sample problems. Of particular relevance for systems biology, the approach can discriminate between structurally different candidate models by proving inconsistency with the available data. Background Systems biology is a framework integrating diverse disciplines such as molecular biology and genetics with mathematical modelling and simulation, where computational models assume an increasingly important role in recent years. Simulations are an essential element for quantitative understanding because the highly nonlinear behavior of complex biological systems can sometimes be counterintuitive [1,2]. System identification, which comprises both parameter estimation and structural network analysis in biological systems, can be extremely difficult since the governing principles and topological interactions are often not known [3]. The flood of data from novel highthroughput and genome-wide analyses further adds to the Published: 15 January 2007 BMC Bioinformatics 2007, 8:12 doi:10.1186/1471-2105-8-12 Received: 22 September 2006 Accepted: 15 January 2007 This article is available from: http://www.biomedcentral.com/1471-2105/8/12 © 2007 Kuepfer et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Page 1 of 11 (page number not for citation purposes) BMC Bioinformatics 2007, 8:12 http://www.biomedcentral.com/1471-2105/8/12 problem, and their diversity poses additional challenges for consistency testing and integration. Exact values of kinetic parameters (e.g., association or dissociation constants for protein-protein interaction) are very difficult to determine experimentally, because they are a function of the heterogeneous spatial and developmental cellular conditions. Most approaches to mathematical modelling of biological systems either avoid parametrization by focusing on static steady-state models of reaction stoichiometry or topology [4,5], approximate behavior and regulation through heuristic-based approaches [6,7], or reduce the solution space by applying enzyme kinetics such as in traditional Michaelis-Menten or linlog kinetics, respectively [8,9]. While this enables the detailed modelling of timedependent processes [10], the underlying quasi steadystate assumption a priori neglects the dynamics of enzyme complexes. Thus, in many cases the use of mass-action kinetics in the form of elementary reactions becomes necessary, for instance when some of the enzyme kinetics do not follow Michaelis-Menten [11], a steady-state of the system cannot be assumed [12], transport functions have to be described [13] or the network structure is uncertain [14,15]. The systems of equations that arise in time-dependent models based on chemo-physical kinetics of any kind are usually hard to parameterize, since their components are tightly coupled and there is limited information about the time evolution of the concentrations of all the species. Several approaches for efficient system identification have been developed, ranging from reduction of system dimensionality [16] to decoupling of the differential equations [17]. The actual parametrization algorithm, here referred to as the wrapping algorithm, is of particular importance [18] and was for example highlighted in a comparison between gradient-based methods and genetic and evolutionary algorithms [19]. All of these approaches however consider single, isolated points in the parameter space; this can be very time-consuming due to the iterative nature of the parametrization process [19] and furthermore cannot guarantee that the algorithm finds possible solutions. We present here a conceptually novel approach based on techniques from numerical convex optimization, in particular semidefinite programming (SDP) [20], that can efficiently partition the parameter space into feasible and infeasible regions. Basically, our approach allows to analyze sets of mass action kinetics, i.e., ordinary differential equations (ODEs) at steady state, under consideration of additional algebraic equality and inequality constraints (e.g., mass balances or formation rates). It is thus possible to simultaneously consider all the available information during the parameter estimation itself, rather than checking consistency a posteriori as in most current methods. Previous applications used SDP either for rational experimental design based on covariance analysis [21] or for the derivation of barrier certificates by construction of surrogate models [22]. In contrast, the algorithm presented here needs no problem reformulation but is based directly on the original parameter estimation problem. By exploiting convexity in the search for feasible steady state concentrations, our methods enable a multi-dimensional perturbation analysis for sets of parameters. Our algorithm therefore provides rigorous proofs for the classification of the parameter space into feasible and infeasible regions, thus reducing the number of points needed for consideration by several orders of magnitude. Methods Steady state analysis The natural starting point in the analysis of dynamical systems in biology is the determination of a steady state equilibrium of the model, since it represents the reference point for any kind of perturbation introduced to the system. It is therefore of utmost importance that the species' concentrations at an equilibrium point, which is henceforth synonymously referred to as steady state, are realistic and validated against as much data as possible. In the present study we consider models in the form of mass action kinetics, hence all reaction rates are direct functions of the concentrations y and the kinetic constants k. At an equilibrium point, the corresponding system of n ODEs reduces to a set of polynomial equations fi (k, y) = 0, i = 1, ..., n. (1) Besides the equations above, the state concentrations also have to satisfy various kinds of constraints based on experimental data, e.g., formation rates or overall mass balances. Since these will in practice inevitably include certain errors, inequality constraints also arise naturally. Consider for instance the general mass balance equations, here described by a matrix M, which when multiplied with the array of state variables y satisfy M·y = b, M ∈ m × n, (2) where b ∈ m is the total amount of each species. Hence, Eqn. (2) represents overall mass balances while Eqn. (1) comprises the mass action kinetics, i.e., the formation and consumption rates of each single species. Since components can exist both unbound or bound in a complex, respectively, there are in general fewer mass balance equations than differential equations (n ≥ m). An experimental Page 2 of 11 (page number not for citation purposes) BMC Bioinformatics 2007, 8:12 http://www.biomedcentral.com/1471-2105/8/12 error ε can be taken into account by relaxing the exact mass balance equations to inequalities where (1 ε)·b ≤ M·y ≤ (1 + ε)·b. (3) Thus, the parameter estimation problem can be rewritten as a nonlinear optimization that depends on the steady state concentrations y and the set of kinetic parameters k. Here, f0 (k, y) is an arbitrary objective function, e.g., the overall model error, the equality constraints fi (k, y) represent the steady state condition and g (k, y) is a set of inequality constraints, e.g., the mass balances. Semidefinite programming In the following, we will introduce a reformulation of the generalized parameter estimation problem (4) based on semidefinite programming (SDP). SDP is a specific kind of convex optimization problem [20,23,24], with very appealing numerical properties. An SDP problem corresponds to the optimization of a linear function subject to a matrix inequality. The only prerequisite for the applicability of the SDP is a quadratic (or polynomial) representation of the original set of equalities and inequalities (4), which allows a reformulation/relaxation in terms of symmetric matrices. For clarity of exposition, we focus on the case where the fi are quadratic and the gi are linear, although our methods extend to the fully polynomial case; see [25] for details. For the SDP relaxation we define new variables x, X in terms of the original state variables y by: Based on these definitions, the original problem (4), including both steady state equations and data consistency inequalities, can be rewritten as: Here, the symmetric matrix Qobj defines an objective function (e.g., the identity matrix), that selects a specific solution out of the possibly many equilibria. The symmetric matrices Qi correspond to the set of ODEs at steady state (Eqn. (1)) and the matrix L to linear inequality constraints derived, for instance, from approximate mass-balance equations (Eqn. (3)). Note, that in general the matrices Qi are a function of the set of (generally unknown) rate parameters kj, while L, which represents the mass balances, is not. The basic convex relaxation described in the Appendix then yields the following SDP relaxation of (6): where Tr is the trace operator, which adds up the diagonal elements, and the inequality in the last line indicates that the matrix X must be positive semidefinite, i.e., all its eigenvalues should be greater than or equal to zero. The vector e1 is an all-zero vector, except for its first entry, which equals one to enforce X11 = 1. The set of feasible solutions, i.e., the set of matrices X that satisfy the constraints, is always a convex set. Recall that the matrix X as defined in Eqn. (5) is by construction a rank one matrix. This rank condition, however, is not guaranteed for the X obtained from the optimization, and thus this property has to be checked independently after a solution to the SDP is computed. This is necessary because the relaxed problem formulation (7) is less strict then the original form (6), hence the set of feasible solutions becomes larger. Note, that introduction of additional nonlinear constraints helps to reduce the set of " false positive" solutions, an aspect discussed in more detail in an additional section below. Finally, in the particular case of Qobj = 0, the problem reduces to whether or not the inequality can be satisfied for some matrix X. In this case, the SDP is referred to as a feasibility problem, where we are interested in proving the mere existence of solutions rather than finding any particular one. Dual SDP problems The convexity of SDP has made it possible to develop sophisticated and reliable analytical and numerical methods to solve them [20]. A very important feature of SDP problems, from both the theoretical and applied viewpoints, is the associated duality theory (see also Appendix). For every SDP of the form (7) (usually called the primal problem), there is another associated SDP, called the dual problem, which can be derived via Lagrangian duality: min ( , )

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Productivity outcomes for recent grants and fellowships awarded by the American Osteopathic Association Bureau of Research.

The objective of the present study was to evaluate productivity outcome measures for recent research grants and fellowships awarded through the American Osteopathic Association (AOA) Bureau of Research. Recipients of grants and fellowships that were awarded between 1995 and 2001 were contacted by mail, e-mail, or telephone and asked to provide information about publications, resulting grant awa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 112  شماره 

صفحات  -

تاریخ انتشار 2004